Add DatabricksTaskOperator #40013
cc: @tatiana for helping review the PR.
LGTM
It looks great, @pankajkoti, thanks for all the refactoring and improvements!
I added a minor comment on the documentation.
This pull request introduces the [DatabricksTaskOperator](https://github.com/astronomer/astro-provider-databricks/blob/main/src/astro_databricks/operators/common.py#L26) to the Airflow Databricks provider from the [astro-provider-databricks](https://github.com/astronomer/astro-provider-databricks/tree/main) repository. Unlike the `DatabricksNotebookOperator`, which can only run notebook tasks, the `DatabricksTaskOperator` can run any task type supported by the [Jobs API in its tasks attribute](https://docs.databricks.com/api/workspace/jobs/create#tasks), including notebooks, JAR files, Python scripts, Databricks SQL queries and dashboards, Delta Live Tables pipelines, dbt tasks and more. It is another in a series of pull requests contributing operators and features from that repository to the Airflow Databricks provider. This PR also moves the implementation shared by the `DatabricksNotebookOperator` and `DatabricksTaskOperator` into an abstract base class, `DatabricksTaskBaseOperator`, so that the common methods are not duplicated.
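The refactoring described above can be illustrated with a minimal, standalone sketch. It does not import Airflow and is not the provider's code: the class names `TaskBaseSketch`, `NotebookTaskSketch` and `GenericTaskSketch`, the `build_payload` and `_task_spec` methods, and the constructor arguments are all hypothetical and chosen only to show the pattern. The payload keys (`task_key`, `existing_cluster_id`, `notebook_task`, `spark_python_task`) follow the Jobs API task fields.

```python
from abc import ABC, abstractmethod
from typing import Any


class TaskBaseSketch(ABC):
    """Hypothetical stand-in for a shared base class (not the provider's code)."""

    def __init__(self, task_id: str, cluster_id: str) -> None:
        self.task_id = task_id
        self.cluster_id = cluster_id

    @abstractmethod
    def _task_spec(self) -> dict[str, Any]:
        """Return only the task-type-specific part of the payload."""

    def build_payload(self) -> dict[str, Any]:
        # Logic common to every task type is written once, in the base class.
        return {
            "task_key": self.task_id,
            "existing_cluster_id": self.cluster_id,
            **self._task_spec(),
        }


class NotebookTaskSketch(TaskBaseSketch):
    """Supports exactly one task type: a notebook."""

    def __init__(self, task_id: str, cluster_id: str, notebook_path: str) -> None:
        super().__init__(task_id, cluster_id)
        self.notebook_path = notebook_path

    def _task_spec(self) -> dict[str, Any]:
        return {"notebook_task": {"notebook_path": self.notebook_path}}


class GenericTaskSketch(TaskBaseSketch):
    """Accepts an arbitrary task configuration and passes it through."""

    def __init__(self, task_id: str, cluster_id: str, task_config: dict[str, Any]) -> None:
        super().__init__(task_id, cluster_id)
        self.task_config = task_config

    def _task_spec(self) -> dict[str, Any]:
        # Copy so later changes to the payload do not mutate the caller's dict.
        return dict(self.task_config)


if __name__ == "__main__":
    notebook = NotebookTaskSketch("nb_1", "cluster-1", "/Shared/Notebook_1")
    script = GenericTaskSketch(
        "py_1",
        "cluster-1",
        {"spark_python_task": {"python_file": "dbfs:/scripts/job.py"}},
    )
    print(notebook.build_payload())
    print(script.build_payload())
```

In this sketch the generic class differs from the notebook class only in what `_task_spec` returns, which is the kind of duplication an abstract base class is meant to remove.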
As part of Astronomer's internal plans and decisions, we've decided to contribute the existing functionality provided by the operators and plugins in this repository to the official Apache Airflow Databricks provider. To achieve this, we submitted the following PRs to the Airflow provider:

1. apache/airflow#39178
2. apache/airflow#39771
3. apache/airflow#40013
4. apache/airflow#40724
5. apache/airflow#39295

All functionality has now been contributed to the Airflow Databricks provider, and ongoing support will be maintained there. As a result, we're deprecating the operators and plugins in this repository. Users are encouraged to transition to the official Apache Airflow Databricks provider as soon as possible. The migration process is straightforward: update the import path to point to the Airflow provider and ensure that you install `apache-airflow-providers-databricks>=6.8.0`, which includes all the contributions mentioned above.

closes: astronomer/issues-airflow#715
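As a rough aid for the migration, the version requirement can be checked from Python before switching import paths. The sketch below uses only the standard library; the helper names `version_tuple` and `provider_is_new_enough` are hypothetical, and the version parsing is deliberately naive (it keeps the leading digits of each dotted component and ignores pre-release suffixes), so treat it as an illustration and not a replacement for a proper dependency pin.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Naively turn '6.8.0' or '6.8.0rc1' into (6, 8, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def provider_is_new_enough(
    minimum: str = "6.8.0",
    package: str = "apache-airflow-providers-databricks",
) -> bool:
    """Return True if `package` is installed at `minimum` or later."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False  # not installed at all
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    if provider_is_new_enough():
        print("Provider is recent enough; update the import paths.")
    else:
        print("Install apache-airflow-providers-databricks>=6.8.0 first.")
```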
Example successful run of a DAG using the `DatabricksTaskOperator` with the implementation from this PR.